{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Continuous training with TFX and Google Cloud AI Platform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning Objectives\n",
    "\n",
    "1.  Use the TFX CLI to build a TFX pipeline.\n",
    "2.  Deploy a new TFX pipeline version with tuning enabled to a hosted AI Platform Pipelines instance.\n",
    "3.  Create and monitor a TFX pipeline run using the TFX CLI and KFP UI."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, you use utilize the following tools and services to deploy and run a TFX pipeline on Google Cloud that automates the development and deployment of a TensorFlow 2.3 WideDeep Classifer to predict forest cover from cartographic data:\n",
    "\n",
    "* The [**TFX CLI**](https://www.tensorflow.org/tfx/guide/cli) utility to build and deploy a TFX pipeline.\n",
    "* A hosted [**AI Platform Pipeline instance (Kubeflow Pipelines)**](https://www.tensorflow.org/tfx/guide/kubeflow) for TFX pipeline orchestration.\n",
    "* [**Dataflow**](https://cloud.google.com/dataflow) jobs for scalable, distributed data processing for TFX components.\n",
    "* A [**AI Platform Training**](https://cloud.google.com/ai-platform/) job for model training and flock management for parallel tuning trials. \n",
    "* [**AI Platform Prediction**](https://cloud.google.com/ai-platform/) as a model server destination for blessed pipeline model versions.\n",
    "* [**CloudTuner**](https://www.tensorflow.org/tfx/guide/tuner#tuning_on_google_cloud_platform_gcp) and [**AI Platform Vizier**](https://cloud.google.com/ai-platform/optimizer/docs/overview) for advanced model hyperparameter tuning using the Vizier algorithm.\n",
    "\n",
    "You will then create and monitor pipeline runs using the TFX CLI as well as the KFP UI."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Update lab environment PATH to include TFX CLI and skaffold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "\n",
    "# Set `PATH` to include the directory containing TFX CLI and skaffold.\n",
    "PATH=%env PATH\n",
    "%env PATH=/home/jupyter/.local/bin:{PATH}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Validate lab package version installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -c \"import tensorflow; print('TF version: {}'.format(tensorflow.__version__))\"\n",
    "!python -c \"import tfx; print('TFX version: {}'.format(tfx.__version__))\"\n",
    "!python -c \"import kfp; print('KFP version: {}'.format(kfp.__version__))\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: this lab was built and tested with the following package versions:\n",
    "\n",
    "`TF version: 2.3.2`  \n",
    "`TFX version: 0.25.0`  \n",
    "`KFP version: 1.4.0`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Optional) If running the above command results in different package versions or you receive an import error, upgrade to the correct versions by running the cell below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --upgrade --user tensorflow==2.3.2\n",
    "%pip install --upgrade --user tfx==0.25.0\n",
    "%pip install --upgrade --user kfp==1.0.4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: you may need to restart the kernel to pick up the correct package versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Validate creation of AI Platform Pipelines cluster\n",
    "\n",
    "Navigate to [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console.\n",
    "\n",
    "Note you may have already deployed an AI Pipelines instance during the Setup for the lab series. If so, you can proceed using that instance. If not:\n",
    "\n",
    "**1.  Create or select an existing Kubernetes cluster (GKE) and deploy AI Platform**. Make sure to select `\"Allow access to the following Cloud APIs https://www.googleapis.com/auth/cloud-platform\"` to allow for programmatic access to your pipeline by the Kubeflow SDK for the rest of the lab. Also, provide an `App instance name` such as \"tfx\" or \"mlops\". \n",
    "\n",
    "Validate the deployment of your AI Platform Pipelines instance in the console before proceeding."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Review: example TFX pipeline design pattern for Google Cloud\n",
    "The pipeline source code can be found in the `pipeline` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%cd pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -la"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `config.py` module configures the default values for the environment specific settings and the default values for the pipeline runtime parameters. \n",
    "The default values can be overwritten at compile time by providing the updated values in a set of environment variables. You will set custom environment variables later on this lab."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `pipeline.py` module contains the TFX DSL defining the workflow implemented by the pipeline.\n",
    "\n",
    "The `preprocessing.py` module implements the data preprocessing logic  the `Transform` component.\n",
    "\n",
    "The `model.py` module implements the training, tuning, and model building logic for the `Trainer` and `Tuner` components.\n",
    "\n",
    "The `runner.py` module configures and executes `KubeflowDagRunner`. At compile time, the `KubeflowDagRunner.run()` method converts the TFX DSL into the pipeline package in the [argo](https://argoproj.github.io/argo/) format for execution on your hosted AI Platform Pipelines instance.\n",
    "\n",
    "The `features.py` module contains feature definitions common across `preprocessing.py` and `model.py`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: build your pipeline with the TFX CLI\n",
    "\n",
    "You will use TFX CLI to compile and deploy the pipeline. As explained in the previous section, the environment specific settings can be provided through a set of environment variables and embedded into the pipeline package at compile time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure your environment resource settings\n",
    "\n",
    "Update  the below constants  with the settings reflecting your lab environment. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `GCP_REGION` - the compute region for AI Platform Training, Vizier, and Prediction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `ARTIFACT_STORE` - An existing GCS bucket. You can use any bucket or use the GCS bucket created during installation of AI Platform Pipelines. The default bucket name will contain the `kubeflowpipelines-` prefix."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `CUSTOM_SERVICE_ACCOUNT` - In the gcp console Click on the Navigation Menu. Navigate to `IAM & Admin`, then to `Service Accounts` and use the service account starting with prefix - `'tfx-tuner-caip-service-account'`. This enables CloudTuner and the Google Cloud AI Platform extensions Tuner component to work together and allows for distributed and parallel tuning backed by AI Platform Vizier's hyperparameter search algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `ENDPOINT` - set the `ENDPOINT` constant to the endpoint to your AI Platform Pipelines instance. The endpoint to the AI Platform Pipelines instance can be found on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console. Open the *SETTINGS* for your instance and use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SKD* section of the *SETTINGS* window. The format is `'...pipelines.googleusercontent.com'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT_ID = !(gcloud config get-value core/project)\n",
    "PROJECT_ID = PROJECT_ID[0]\n",
    "GCP_REGION = 'us-central1'\n",
    "ARTIFACT_STORE_URI = f'gs://{PROJECT_ID}-kubeflowpipelines-default'\n",
    "CUSTOM_SERVICE_ACCOUNT = f'tfx-tuner-caip-service-account@{PROJECT_ID}.iam.gserviceaccount.com'\n",
    "\n",
    "#TODO: Set your environment resource settings here for ENDPOINT.\n",
    "ENDPOINT = ''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set your resource settings as environment variables. These override the default values in pipeline/config.py.\n",
    "%env GCP_REGION={GCP_REGION}\n",
    "%env ARTIFACT_STORE_URI={ARTIFACT_STORE_URI}\n",
    "%env CUSTOM_SERVICE_ACCOUNT={CUSTOM_SERVICE_ACCOUNT}\n",
    "%env PROJECT_ID={PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set the compile time settings to first create a pipeline version without hyperparameter tuning\n",
    "\n",
    "Default pipeline runtime environment values are configured in the pipeline folder `config.py`. You will set their values directly below:\n",
    "\n",
    "* `PIPELINE_NAME` - the pipeline's globally unique name. For each pipeline update, each pipeline version uploaded to KFP will be reflected on the `Pipelines` tab in the `Pipeline name > Version name` dropdown in the format `PIPELINE_NAME_datetime.now()`.\n",
    "\n",
    "* `MODEL_NAME` - the pipeline's unique model output name for AI Platform Prediction. For multiple pipeline runs, each pushed blessed model will create a new version with the format `'v{}'.format(int(time.time()))`.\n",
    "\n",
    "* `DATA_ROOT_URI` - the URI for the raw lab dataset `gs://workshop-datasets/covertype/small`.\n",
    "\n",
    "* `CUSTOM_TFX_IMAGE` - the image name of your pipeline container build by skaffold and published by `Cloud Build` to `Cloud Container Registry` in the format `'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)`.\n",
    "\n",
    "* `RUNTIME_VERSION` - the TensorFlow runtime version. This lab was built and tested using TensorFlow `2.3`.\n",
    "\n",
    "* `PYTHON_VERSION` - the Python runtime version. This lab was built and tested using Python `3.7`.\n",
    "\n",
    "* `USE_KFP_SA` - The pipeline can run using a security context of the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace hosting Kubeflow Pipelines. If you want to use the `user-gcp-sa` service account you change the value of `USE_KFP_SA` to `True`. Note that the default AI Platform Pipelines configuration does not define the `user-gcp-sa` secret.\n",
    "\n",
    "* `ENABLE_TUNING` - boolean value indicating whether to add the `Tuner` component to the pipeline or use hyperparameter defaults. See the `model.py` and `pipeline.py` files for details on how this changes the pipeline topology across pipeline versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PIPELINE_NAME = 'tfx_covertype_continuous_training'\n",
    "MODEL_NAME = 'tfx_covertype_classifier'\n",
    "DATA_ROOT_URI = 'gs://workshop-datasets/covertype/small'\n",
    "CUSTOM_TFX_IMAGE = 'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)\n",
    "RUNTIME_VERSION = '2.3'\n",
    "PYTHON_VERSION = '3.7'\n",
    "USE_KFP_SA=False\n",
    "ENABLE_TUNING=True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env PIPELINE_NAME={PIPELINE_NAME}\n",
    "%env MODEL_NAME={MODEL_NAME}\n",
    "%env DATA_ROOT_URI={DATA_ROOT_URI}\n",
    "%env KUBEFLOW_TFX_IMAGE={CUSTOM_TFX_IMAGE}\n",
    "%env RUNTIME_VERSION={RUNTIME_VERSION}\n",
    "%env PYTHON_VERIONS={PYTHON_VERSION}\n",
    "%env USE_KFP_SA={USE_KFP_SA}\n",
    "%env ENABLE_TUNING={ENABLE_TUNING}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compile your pipeline code\n",
    "\n",
    "You can build and upload the pipeline to the AI Platform Pipelines instance in one step, using the `tfx pipeline create` command. The `tfx pipeline create` goes through the following steps:\n",
    "- (Optional) Builds the custom image to that provides a runtime environment for TFX components or uses the latest image of the installed TFX version \n",
    "- Compiles the pipeline code into a pipeline package \n",
    "- Uploads the pipeline package via the `ENDPOINT` to the hosted AI Platform instance.\n",
    "\n",
    "As you debug the pipeline DSL, you may prefer to first use the `tfx pipeline compile` command, which only executes the compilation step. After the DSL compiles successfully you can use `tfx pipeline create` to go through all steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!tfx pipeline compile --engine kubeflow --pipeline_path runner.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: you should see a `{PIPELINE_NAME}.tar.gz` file appear in your current pipeline directory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise: deploy your pipeline container to AI Platform Pipelines with TFX CLI\n",
    "\n",
    "After the pipeline code compiles without any errors you can use the `tfx pipeline create` command to perform the full build and deploy the pipeline. You will deploy your compiled pipeline container hosted on Google Container Registry e.g. `gcr.io/[PROJECT_ID]/tfx_covertype_continuous_training` to run on AI Platform Pipelines with the TFX CLI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Your code here to use the TFX CLI to deploy your pipeline image to AI Platform Pipelines.\n",
    "\n",
    "!tfx pipeline create  \\\n",
    "--pipeline_path=runner.py \\\n",
    "--endpoint={ENDPOINT} \\\n",
    "--build_target_image={CUSTOM_TFX_IMAGE}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Hint**: review the [TFX CLI documentation](https://www.tensorflow.org/tfx/guide/cli#create) on the \"pipeline group\" to create your pipeline. You will need to specify the `--pipeline_path` to point at the pipeline DSL and runner defined locally in `runner.py`, `--endpoint`, and `--build_target_image` arguments using the environment variables specified above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: you should see a `build.yaml` file in your pipeline folder created by skaffold. The TFX CLI compile triggers a custom container to be built with skaffold using the instructions in the `Dockerfile`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you need to redeploy the pipeline you can first delete the previous version using `tfx pipeline delete` or you can update the pipeline in-place using `tfx pipeline update`.\n",
    "\n",
    "To delete the pipeline:\n",
    "\n",
    "`tfx pipeline delete --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}`\n",
    "\n",
    "To update the pipeline:\n",
    "\n",
    "`tfx pipeline update --pipeline_path runner.py --endpoint {ENDPOINT}`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create and monitor a pipeline run with the TFX CLI\n",
    "\n",
    "After the pipeline has been deployed, you can trigger and monitor pipeline runs using TFX CLI.\n",
    "\n",
    "*Hint*: review the [TFX CLI documentation](https://www.tensorflow.org/tfx/guide/cli#run_group) on the \"run group\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: your code here to trigger a pipeline run with the TFX CLI\n",
    "\n",
    "!tfx run create --pipeline_name={PIPELINE_NAME} --endpoint={ENDPOINT}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To view the status of existing pipeline runs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!tfx run list --pipeline_name {PIPELINE_NAME} --endpoint {ENDPOINT}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To retrieve the status of a given run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RUN_ID='[YOUR RUN ID]'\n",
    "\n",
    "!tfx run status --pipeline_name {PIPELINE_NAME} --run_id {RUN_ID} --endpoint {ENDPOINT}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Important\n",
    "\n",
    "A full pipeline run with tuning enabled will take about 50 minutes to complete. You can view the run's progress using the TFX CLI commands above and in the KFP UI. While the pipeline run is in progress, there are also optional exercises below to explore your pipeline's artifacts and Google Cloud integrations while the pipeline run is in progress."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise (optional): review your pipeline's Dataflow jobs for data processing\n",
    "\n",
    "On the [Dataflow](https://console.cloud.google.com/dataflow) page, inspect the computational graphs parallelizing the data processing in the ExampleGen, StatisticsGen, Transform, and Evaluator pipeline components."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise (optional): review your pipeline's AI Platform jobs for model training and hyperparameter tuning\n",
    "\n",
    "On the [AI Platform Jobs](https://console.cloud.google.com/ai-platform/jobs) page, inspect the Training job logs. You can also launch a Tensorboard server directly from the KFP UI to monitor your training performance while the job is in progress. Click on the Trainer component in the KFP UI once it is running and navigate to the `Visualizations` tab. Scroll down to the Tensorboard widget and hit the `Open Tensorboard` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise (optional): review your pipeline's artifacts on Cloud Storage\n",
    "\n",
    "On the [Cloud Storage page](https://console.cloud.google.com/storage), review how TFX standardizes the organization of your pipeline run artifacts. You will find them organized by component and versioned in your `gs://{PROJECT_ID}-kubeflowpipelines-default` artifact storage bucket. This standardization brings reproducibility and traceability to your ML workflows and allows for easier reuse of pipeline components and artifacts across use cases."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this lab, you learned how to build and deploy a TFX pipeline with the TFX CLI and then update, build and deploy a new pipeline with automatic hyperparameter tuning. You practiced triggered continuous pipeline runs using the TFX CLI as well as the Kubeflow Pipelines UI.\n",
    "\n",
    "\n",
    "In the next lab, you will construct a Cloud Build CI/CD workflow that further automates the building and deployment of the TensorFlow WideDeep Classifer pipeline code introduced in this lab."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## License"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=-1>Licensed under the Apache License, Version 2.0 (the \\\"License\\\");\n",
    "you may not use this file except in compliance with the License.\n",
    "You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \\\"AS IS\\\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and limitations under the License.</font>"
   ]
  }
 ],
 "metadata": {
  "environment": {
   "name": "tf2-gpu.2-3.m65",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-3:m65"
  },
  "kernelspec": {
   "display_name": "Python [conda env:root] *",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
